Friday, 18th of October (2018)
Flannick & Florez (2016)
Marullo, El-Sayed Moustafa, & Prokopenko (2014)
Scott et al. (2012)
Yaghootkar & Frayling (2013)
"In all cases, the glucose-raising allele was associated with increased risk of T2D, yet fasting glucose effect sizes and T2D ORs were weakly correlated"
Scott et al. (2012)
To identify biomarkers relevant to a disease
To identify the effect of a treatment on a disease
(independently of the association between the disease and the biomarker)
For example:
(Extended) Cox model: \[\begin{align} \lambda_i(t)=\lambda_0(t) \exp(\beta Y_i(t) + \alpha Z_i + \eta W_i) \end{align}\]
Where:
(Generalised) linear mixed effect model: \[Y_{i}(t_{ij})=X_{i}(t_{ij})+\epsilon_{i}(t_{ij})\]
where:
Test simultaneously an effect on:
A gain in statistical power to detect those effects
The standard (joint likelihood) formulation involves two components:
With:
(Generalised) linear mixed effect model: \[Y_{i}(t_{ij})=X_{i}(t_{ij})+\epsilon_{i}(t_{ij})\]
\(X_{i}(t_{ij})\) is the trajectory function, and could be defined as: \[\begin{gather}X_{i}(t_{ij})=\theta_{0i} + \theta_{1i}t_{ij} + \cdots + \theta_{pi}t_{ij}^p &, & \boldsymbol\theta_i \sim \mathcal{N}_{p+1}(\boldsymbol\mu, \boldsymbol\Sigma)\end{gather}\]
For simplicity here, we assume linearity over time (\(\theta_{0i}+\theta_{1i}t_{ij}\)): \[\begin{gather}Y_{i}(t_{ij})=\theta_{0i}+\theta_{1i}t_{ij}+\gamma Z_i+\delta W_i+\epsilon_{ij} &, & \boldsymbol{\theta} \sim \mathcal{N}_2 (\boldsymbol{\mu},\boldsymbol{\Sigma})\end{gather}\]
(Generalised) linear mixed effect model: \[Y_{i}(t_{ij})=\theta_{0i}+\theta_{1i}t_{ij}+\gamma Z_i+\delta W_i+\epsilon_{ij}\]
With:
(Extended) Cox model (proportional hazards): \[\begin{align} \lambda_i(t)&=\lim_{dt \to 0} \frac{P\{t\leq T_i<t+dt|T_i\geq t, \bar{Y_i}(t), Z_i, W_i\}}{dt}\\ &=\lambda_0(t) \exp\{\beta X_{i}(t) + \alpha Z_i + \eta W_i\} \end{align}\]
With:
Likelihood Ratio Test
\[LRT=-2\{\ell(\hat{\theta}_0)-\ell(\hat{\theta})\}\]
Wald Test
\[\begin{gather}
W=(\hat{\theta}-\theta_0)^\top \mathcal{I}(\hat{\theta})(\hat{\theta}-\theta_0)\\
\left(\text{Univariate: }(\hat{\theta}_j-\theta_{0j})/\widehat{\text{s.e.}}(\hat{\theta}_j)\right)
\end{gather}\]
Score Test
\[U=S^\top(\hat{\theta}_0)\{\mathcal{I}(\hat{\theta}_0)\}^{-1}S(\hat{\theta}_0)\]
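As a toy illustration of the three statistics above (not the joint model itself), the sketch below evaluates them for a single parameter: the rate of exponentially distributed event times with no censoring. The function names and the example data are hypothetical; the log-likelihood is \(\ell(\lambda)=n\log\lambda-\lambda\sum_i t_i\), so the score and Fisher information have closed forms.

```python
import math


def exponential_tests(times, rate0):
    """LRT, Wald and score statistics for H0: rate = rate0 (uncensored exponential times)."""
    n = len(times)
    total = sum(times)
    rate_hat = n / total  # maximum likelihood estimate

    def loglik(rate):
        return n * math.log(rate) - rate * total

    def score(rate):
        return n / rate - total  # first derivative of the log-likelihood

    def information(rate):
        return n / rate ** 2  # Fisher information

    lrt = -2.0 * (loglik(rate0) - loglik(rate_hat))
    wald = (rate_hat - rate0) ** 2 * information(rate_hat)
    score_stat = score(rate0) ** 2 / information(rate0)
    return {"lrt": lrt, "wald": wald, "score": score_stat}


def chi2_1_pvalue(stat):
    """Upper tail of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical data: three event times, null rate of 1 event per time unit.
stats = exponential_tests([1.0, 2.0, 3.0], rate0=1.0)
pvalues = {name: chi2_1_pvalue(value) for name, value in stats.items()}
```

The three statistics are asymptotically equivalent under the null hypothesis, but they can differ noticeably in small samples, as they do for these three observations.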
What if we split the job in two?
\(\Rightarrow\) "Two-Step"? (Tsiatis, DeGruttola, & Wulfsohn, 1995)
(Generalised) linear mixed effect model
\[\begin{align}
Y_{i}(t)&=X_i(t)+ \epsilon_{i}(t)\\
X^*_{i}(t)&=E\{X_{i}(t)|\bar{Y_i}(t), T_i\geq t\}
\end{align}\]
(Extended) Cox model (proportional hazards)
\[\lambda_i(t)=\lambda_0(t) \exp\{\beta X^*_{i}(t)\}\]
Let's keep it simple, i.e., without the additional covariates \(W_i\):
the trajectory: \(Y_{i}(t)=\theta_{0i} + \theta_{1i}t + \gamma Z_i + \epsilon_{i}(t)\)
the event: \(\lambda_i(t)=\lambda_0(t) \exp\{\beta X_{i}(t) + \alpha Z_i\}\)
the time of event, e.g., assuming an exponential distribution, i.e., a constant baseline hazard (Austin, 2012):
\[\begin{gather}
H_i(T_i)=\int_0^{T_i}\lambda_0(t) \exp(\beta X_i(t)+\alpha Z_i)dt , & \lambda_0(t)=\lambda\\
F_i(T_i)=1-\exp(-H_i(T_i))=u , & u\sim\mathcal{U}(0, 1)
\end{gather}\] \[T_i=\frac{1}{\beta\theta_{1i}}\log\left(1-\frac{\beta\theta_{1i}\times \log(1-u)}{\lambda \exp(\beta\theta_{0i}+(\beta\gamma+\alpha)Z_i)}\right)\]
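The closed form for \(T_i\) can be sketched as a small inverse-transform function. This is a minimal sketch, not the code used for the simulations: the function names and the numerical values in the usage lines are hypothetical, and two edge cases that the formula leaves implicit are handled explicitly (a slope \(\beta\theta_{1i}\) close to zero, and a non-positive argument of the logarithm, in which case the event never occurs).

```python
import math


def event_time(u, lam, beta, alpha, gamma, theta0, theta1, z):
    """Solve F_i(T_i) = u for T_i, with a constant baseline hazard lam."""
    rate = lam * math.exp(beta * theta0 + (beta * gamma + alpha) * z)
    slope = beta * theta1
    target = -math.log(1.0 - u)  # cumulative hazard to reach, H_i(T_i)
    if abs(slope) < 1e-12:
        return target / rate  # limit when the trajectory is flat
    inside = 1.0 + slope * target / rate
    if inside <= 0.0:
        return math.inf  # the cumulative hazard never reaches the target
    return math.log(inside) / slope


def cumulative_hazard(t, lam, beta, alpha, gamma, theta0, theta1, z):
    """H_i(t) for the same model, used here only to check the inversion."""
    rate = lam * math.exp(beta * theta0 + (beta * gamma + alpha) * z)
    slope = beta * theta1
    if abs(slope) < 1e-12:
        return rate * t
    return rate * math.expm1(slope * t) / slope


# Hypothetical parameter values, for illustration only.
params = dict(lam=0.01, beta=3.17, alpha=0.265, gamma=0.0229,
              theta0=0.1, theta1=0.0108, z=1)
t = event_time(0.3, **params)
check = cumulative_hazard(t, **params)  # should equal -log(1 - 0.3)
```

Plugging the simulated time back into the cumulative hazard is a cheap way to confirm that the inversion matches the model used to generate the data.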
Scott et al. (2012)
Yaghootkar & Frayling (2013)
| Parameters | Values |
|---|---|
| Number of participants (\(n\)) | 4,352 |
| Number of measures (\(m\)) | 4 |
| Diabetes incidence rate (\(d\)) | 0.0384 |
| Minor allele frequency (\(f\)) | 0.244 |
| Random effects (\(\theta\)) | \(\sim\mathcal{N}_2\left (\begin{bmatrix}4.55\\0.0108\end{bmatrix} , \begin{bmatrix} 0.143 & -0.00109 \\ -0.00109 & 6.8\times 10^{-04} \end{bmatrix} \right )\) |
| SNP effect on \(Y_{ij}\) (\(\gamma\)) | 0.0229 |
| SNP effect on \(T_i\) (\(\alpha\)) | 0.265 (OR=1.3) |
| Association between \(Y_{ij}\) and \(T_i\) (\(\beta\)) | 3.17 |
| Error term (\(\epsilon\)) | \(\sim\mathcal{N}(0,0.305^2)\) |
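One way the longitudinal part of such a dataset could be generated from the values in the table is sketched below with NumPy. Several choices are assumptions that the table does not specify: the seed, the visit times (taken as 0, 1, 2, 3), and genotypes drawn under Hardy-Weinberg equilibrium with additive coding (0, 1 or 2 copies of the minor allele).

```python
import numpy as np

rng = np.random.default_rng(2018)  # arbitrary seed, for reproducibility only

n, m, f = 4352, 4, 0.244               # participants, measures, minor allele frequency
mu = np.array([4.55, 0.0108])          # mean random intercept and slope
sigma = np.array([[0.143, -0.00109],
                  [-0.00109, 6.8e-4]])  # covariance of the random effects
gamma, sd_eps = 0.0229, 0.305          # SNP effect on Y, residual standard deviation

# Assumption: additive genotype coding under Hardy-Weinberg equilibrium.
z = rng.binomial(2, f, size=n)

# Subject-specific intercepts and slopes.
theta = rng.multivariate_normal(mu, sigma, size=n)

# Assumption: equally spaced visits at times 0, 1, ..., m - 1.
times = np.arange(m, dtype=float)

# True trajectory X_i(t_ij), then observed values Y_i(t_ij) with measurement error.
x = theta[:, [0]] + theta[:, [1]] * times + gamma * z[:, None]
y = x + rng.normal(0.0, sd_eps, size=(n, m))
```

The result is an \(n \times m\) matrix of observed biomarker values, one row per participant and one column per visit.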
Let's do a power calculation (Chen, Ibrahim, & Chu, 2011):
\[\begin{gather}
H_0:\ \beta\gamma+\alpha=0 \\
d=\frac{(z_{\tilde{\beta}}+z_{1-\tilde{\alpha}})^2}{f(1-f)(\beta\gamma+\alpha)^2} \\
z_{\tilde{\beta}}=\pm\sqrt{df(1-f)(\beta\gamma+\alpha)^2}-z_{1-\tilde{\alpha}}
\end{gather}\]
\(f\), the allele frequency;
\(\Rightarrow\tilde{\beta}_{\tilde{\alpha}}=46.56\%\) for TCF7L2 (rs17747324).
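The calculation can be sketched with the Python standard library. Three assumptions are made here that the text does not state: \(d\) in the formula is read as the expected number of events (\(n\) times the incidence rate), the test is two-sided at the 5% level, and the positive root is taken, so that \(z_{\tilde{\beta}}=\sqrt{df(1-f)(\beta\gamma+\alpha)^2}-z_{1-\tilde{\alpha}/2}\). The function name is hypothetical.

```python
from statistics import NormalDist


def joint_model_power(n_events, maf, beta, gamma, alpha, type1=0.05):
    """Approximate power to detect the overall SNP effect beta * gamma + alpha."""
    overall = beta * gamma + alpha
    noncentrality = (n_events * maf * (1.0 - maf) * overall ** 2) ** 0.5
    z_crit = NormalDist().inv_cdf(1.0 - type1 / 2.0)  # two-sided critical value
    return NormalDist().cdf(noncentrality - z_crit)


# Values from the parameter table for TCF7L2 (rs17747324).
events = 4352 * 0.0384  # assumption: d is the expected number of events
power = joint_model_power(events, maf=0.244, beta=3.17, gamma=0.0229, alpha=0.265)
```

Under these assumptions the result is about 0.47, close to the figure quoted above; the small gap is plausibly due to rounding of the tabulated parameters.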
| Sample Size | Joint model: mean (sd) per SNP in seconds | Joint model: 100K SNPs in days | Two-Step: mean (sd) per SNP in seconds | Two-Step: 100K SNPs in days |
|---|---|---|---|---|
| 500 | 51 (3.4) | 59 | 0.71 (0.066) | 0.82 |
| 2,500 | 100 (11) | 120 | 3.1 (0.092) | 3.6 |
| 5,000 | 180 (25) | 210 | 6.3 (0.17) | 7.3 |
| 10,000 | 340 (34) | 400 | 9 (0.22) | 10 |
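The "100K SNPs in days" columns appear to be a straight extrapolation of the mean time per SNP, assuming the SNPs are analysed one after another on a single core. A minimal sketch of that conversion (the function name is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_for_snps(seconds_per_snp, n_snps=100_000):
    """Sequential run time in days for n_snps analyses at a given mean time per SNP."""
    return seconds_per_snp * n_snps / SECONDS_PER_DAY


slow = days_for_snps(51)    # first timing column at n = 500, about 59 days
fast = days_for_snps(0.71)  # second timing column at n = 500, about 0.82 days
```

Since each SNP is analysed independently, the wall-clock time would in principle shrink roughly in proportion to the number of cores used.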
Scott et al. (2012)
Yaghootkar & Frayling (2013)
| Parameters | rs10830963_G (MTNR1B) | rs17747324_C (TCF7L2) |
|---|---|---|
| \(\alpha\) | -0.4404 (\(p=9.37\times 10^{-04}\)) | 0.2652 (\(p=4.09\times 10^{-02}\)) |
| \(\beta\) | 3.2511 (\(p=3.63\times 10^{-42}\)) | 3.1703 (\(p=8.93\times 10^{-42}\)) |
| \(\gamma\) | 0.0991 (\(p=1.33\times 10^{-23}\)) | 0.0229 (\(p=3.02\times 10^{-02}\)) |
The Joint Model is better than the "Two-Step" approach
The "Two-Step" approach is not that bad, especially regarding computation time
Me?
Austin, P. C. (2012). Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Statistics in Medicine, 31(29), 3946–3958. https://doi.org/10.1002/sim.5452
Chen, L. M., Ibrahim, J. G., & Chu, H. (2011). Sample size and power determination in joint modeling of longitudinal and survival data. Statistics in Medicine, 30(18), 2295–2309. https://doi.org/10.1002/sim.4263
Flannick, J., & Florez, J. C. (2016). Type 2 diabetes: Genetic data sharing to advance complex disease research. Nature Reviews Genetics, advance online publication. https://doi.org/10.1038/nrg.2016.56
Marullo, L., El-Sayed Moustafa, J. S., & Prokopenko, I. (2014). Insights into the Genetic Susceptibility to Type 2 Diabetes from Genome-Wide Association Studies of Glycaemic Traits. Current Diabetes Reports, 14(11). https://doi.org/10.1007/s11892-014-0551-8
Scott, R. A., Lagou, V., Welch, R. P., Wheeler, E., Montasser, M. E., Luan, J., … Barroso, I. (2012). Large-scale association analyses identify new loci influencing glycemic traits and provide insight into the underlying biological pathways. Nature Genetics, 44(9), 991–1005. https://doi.org/10.1038/ng.2385
Tsiatis, A. A., DeGruttola, V., & Wulfsohn, M. S. (1995). Modeling the Relationship of Survival to Longitudinal Data Measured with Error. Applications to Survival and CD4 Counts in Patients with AIDS. Journal of the American Statistical Association, 90(429), 27–37. https://doi.org/10.2307/2291126
Yaghootkar, H., & Frayling, T. M. (2013). Recent progress in the use of genetics to understand links between type 2 diabetes and related metabolic traits. Genome Biology, 14(3), 203.